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The advent of geolocated ICT technologies opens the possibility of exploring how people use space 
in cities, bringing an important new tool for urban scientists and planners, especially for regions 
where data is scarce or not available. Here we apply a functional network approach to determine 
land use patterns from mobile phone records. The versatility of the method allows us to run a 
systematic comparison between Spanish cities of various sizes. The method detects four major land 
use types that correspond to different temporal patterns. The proportion of these types, their spatial 
organization and scaling show a strong similarity between all cities that breaks down at a very local 
scale, where land use mixing is specific to each urban area. Finally, we introduce a model inspired 
by Schelling’s segregation, able to explain and reproduce these results with simple interaction rules 
between different land uses. 


INTRODUCTION 

Land use patterns appear as a natural result of cit¬ 
izens’ and planners’ interaction with the urban space. 
However, in a feedback loop, they also play a ma¬ 
jor role in the experience that residents and visitors 
have of a city [1]. Land use patterns have an effect 
on the liveability of neighborhoods and even on the 
health of the local residents [2]. On the other hand, 
land use and transportation display a well-established 
relation [3-6]. Transport demand depends on the lo¬ 
cation of residence and business areas, while the pres¬ 
ence of new transport lines or facilities such as metro 
stations can substantially modify the land use mixing 
in a given area of the city. These ideas lie behind the 
development of the so-called Land Use Transport In¬ 
teraction (LUTI) models [7, 8], which are commonly 
employed in transport planning around the globe [9]. 

An important issue regarding land use refers to the 
methods employed to estimate it. City Hall regis¬ 
ters, surveys or satellite images have been used in 
the past to this end [10-1 ]. The emergence of geo¬ 
located ICT technologies introduces extra capabili¬ 
ties to directly measure the use that citizens make 
of each urban space. The information is exhaustive in 
terms of spatial and temporal resolution, allowing for 
the detection of concentrations of people second by 
second along days, weeks and months. Information 
from mobile phone call records [17-30], geolocated 
tweets [15, 31-34], credit card use [35] or Foursquare 
[22] has been considered in the literature. Different 
data sources have been compared, finding a consis¬ 
tent agreement among the estimations on human con¬ 
centrations and mobility obtained from different ICT 
data [27], as well as between ICT data and more tra¬ 
ditional techniques [21-24, 27, 29, 36]. 

Such wealth of information together with the ability 
to process massive data brought by the Internet era 


allows the systematic comparison of features across 
cities. This analysis can lead to the discovery and con¬ 
firmation of properties that have been hypothesized 
to be common to all cities, and also to laws provid¬ 
ing insights into the way a property scales with city 
size. Some examples of these properties include num¬ 
ber of patents filed, unemployment rates, GDP per 
capita, business diversity, consumption of resources, 
length of road networks, or even crime density [37- 
43]. The finding of these laws raises the hope of 
the existence of a coherent framework for city science 
[38, 39, 41, 42, 44-46]. 

In this work, we explore land use patterns in the five 
most populous urban areas of Spain. Land use infor¬ 
mation is obtained from mobile phone records using 
a new framework based on network theory and sys¬ 
tematic comparisons of land use distribution across 
the five cities are performed at different scales. Our 
results reveal common features in the land use types’ 
spatial distributions, which can be understood with 
a model introduced also here. The similarities break 
down when the land use type mixing is studied at very 
short spatial scales, exposing patterns characteristic 
to each city. 

MATERIALS AND METHODS 
A network approach to detect land use 

Our database is composed of aggregated and 
anonymized call records during 55 days between 
September and November 2009 in Spain. Every time 
a user receives or makes a call, the event is registered 
together with the tower (BTS) providing the service. 
The positions of the BTSs are geo-referenced and so 
the activity levels of each spatial area can be tracked in 
time. For this work, we select the five most populated 
metropolitan areas of Spain: Madrid (with a popula- 
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Figure 1. Steps of the method to detect land use. (A- 

B) The urban area is divided in cells of equal area. (C) For 
each cell, we calculate an activity profile in terms of phone 
calls along time during the days of the week. (D) A Pearson 
correlation matrix between cell activities is computed. Then 
the matrix formed by correlations over a threshold value <5 
is used to define an undirected weighted network (E), which 
is clusterized using community detection techniques and the 
results plotted again on the city map (F). 


tion over 5.5 million people), Barcelona (3.2 million), 
Valencia (1.5 million), Seville (980,000) and Bilbao 
(900,000). There is no unique definition for the bor¬ 
der of an urban area. It may refer, for instance, to 
official, census or economic delimitation of the cities. 
Since the focus here is on urban land use, we are inter¬ 
ested in identifying the inner zones of each city and, 
therefore, we use the definition of the metropolitan 
transportation offices: only areas served by metro or 
urban buses are considered. This is, nevertheless, an 
important question because the selection of borders 
may influence the scaling analysis when comparing 
across cities [42, 47]. 

The space of the urban areas is divided following 
a Voronoi tessellation with the BTS location as cen¬ 
ters. The extension of the areas served by each BTS is 
very different, since it depends on the expected peaks 
of demand. To ensure a common geographical frame¬ 
work, the five urban areas are divided in a grid with 
square cells of 500 x 500 m 2 to which the activity is 
mapped. This should prevent spurious effects due to 
the Voronoi areas heterogeneity (see the Appendix for 
a detailed description of the cities and the division 
process). 


The activity (number of users) in each cell is moni¬ 
tored in time and then processed as illustrated in Fig¬ 
ure 1. Average activity profiles are estimated over 
each day of the week hour by hour in every cell. These 
profiles are normalized by the total hourly activity 
to subtract the trends introduced by the circadian 
rhythms. A Pearson correlation coefficient is then cal¬ 
culated between the activities of every pair of cells, 
obtaining a correlation matrix describing the level of 
similarity between activity profiles. The correlations 
can take positive and negative values. Distributions 
of these values are shown in the Figure S4 of the Ap¬ 
pendix. In order to remove non-significant and nega¬ 
tive correlations we only consider Pearson correlation 
coefficients higher than a threshold S. As a result, we 
obtain one weighted network per urban area. We first 
note that variations of the threshold do not produce 
significant changes in the properties of the resulting 
network. The results in the main text refer to a value 
of S equal to the correlation distribution dispersion. 

Once the networks are built, their mesoscopic struc¬ 
ture is analyzed using clustering techniques. The 
main advantage of community detection algorithms in 
networks compared to more classical clustering tech¬ 
niques based on dissimilarity matrix is that the num¬ 
ber of clusters do not need to be fixed a priori. How¬ 
ever, it is important to note that different clustering 
methods can lead to distinct partitions of the net¬ 
works. We report next results obtained with Infomap 
[ t8], while a systematic comparison with results ob¬ 
tained with other clustering tools is provided in the 
Appendix. As mentioned previously, Infomap does 
not require the input of a predetermined number of 
clusters. Therefore, it is interesting to find that in 
the five cities, between 98 and 100% of the cells are 
covered with only 4 groups. Figure 2 shows how the 
activity looks like for each of these four clusters in 
Madrid (similar plots for Barcelona, Valencia, Seville 
and Bilbao are included as Figure S9 and S10). 

Each of the clusters can be associated with a main 
land use: 

1. Residential (red), which is characterized by 
low activities from 8am to 5 — 6pm. For the 
cells composing this group, the activity peaks 
around 7 — 8am and during the evening. In the 
weekend, the activity is almost constant except 
for the night hours; 

2. Business (blue), where the activity is signif¬ 
icantly higher during the weekdays than dur¬ 
ing the weekends. Furthermore, it concentrates 
from 9am to 6 — 7pm. This land use designation 
can be related to a wide range of commercial, 
retail, service and office uses; 

3. Logistics/Industry (cyan), where, as for 
Business, the activity is higher during the week¬ 
days. We observe a large peak between 5am and 
7am followed by a smaller peak around 3pm. 
This cluster can be related to transport and dis¬ 
tribution of goods: for example, ” Mercamadrid” 
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(the largest distribution area of Madrid) belong 
to this cluster; 

4. Nightlife (orange), which is characterized by 
high activity during the night hours (lam-4am), 
especially during the weekends. During the 
weekdays, these areas show higher activity be¬ 
tween 9am and 6pm, as for the Business cluster, 
which may be hinting a certain level of mixing in 
the land use. Some examples of this category are 
the ’’Gran Via” in Madrid and the ’’Ramblas” 
of Barcelona where abound theatres, restaurants 
and pubs mixed with offices and shops. This is 
typically the smallest cluster of the four in num¬ 
ber of cells. 
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Figure 2. Temporal patterns associated with the four 
clusters for the metropolitan area of Madrid. In red: Resi¬ 
dential cluster; In blue: Business; In cyan: Logistics/Industry; 
And in orange: Nightlife. 


More systematically, we compared the land use 
patterns obtained with our algorithm with cadastral 
data. The dataset contains information about land 
use for each cadastral parcel of the metropolitan area 
of Madrid and Barcelona (about 650,000 parcels). In 
particular, we have for each cadastral parcel the net 
internal area devoted to Residential, Business and In¬ 
dustrial uses. This information can be used to iden¬ 
tify the dominant cadastral land use in each grid cell 
classified as Residential, Business and Industrial uses 
by the community detection algorithm. To do so we 
need to define a rule to determine what is the dom¬ 
inant land use in a cell. Intuitively, one would tend 
to identify the dominant land use in a cell as the land 
use class with the largest area. However, the nature 
of both land use assignations is very different: the 
cadastral data is based on the net internal area offi¬ 
cially devoted to each activity and not necessarily on 


the number of people performing it. Therefore, Resi¬ 
dential use is the land use class with the largest area 
in most of the cell leading to an over-representation 
of Residential cells in the metropolitan area. To cir¬ 
cumvent this limitation we introduce two thresholds 
to identify Business and Logistics cells with cadastral 
data in order to obtain a distribution of the fraction 
of cells according to the land use type similar to the 
one obtained with our algorithm (see the Appendix for 
details). The overall agreement is high, we find a per¬ 
centage of correct predictions equal to 65% for Madrid 
and 60% for Barcelona which is consistent with values 
obtained in other studies, 54% in [21] and 58% in [23]. 
Furthermore, for both case studies, almost all land use 
types have a percentage of correct predictions higher 
than 50% (see the Appendix for details). 


RESULTS 


Comparison of cities 


Once defined the clusters, we can compare the pro¬ 
portion of land use type over the five case studies. 
Figure 3 shows the fraction of cells and the fraction of 
mobile phone activity averaged over the time period 
according to the land use patterns for each metropoli¬ 
tan area. We find similar results for the five case stud¬ 
ies, the Residential land use type represents in average 
about 40% of the cells and the mobile phone activity 
against 30% for the Business category and less than 
15% for the Logistics and Nightlife clusters. 


A B 
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□ Business 

□ Logistics 

□ Nightlife 


Figure 3. Fraction of cells (A) and mobile phone users 
(B) according to the type of land use for each case study. 

The fraction of mobile phone users is averaged over the 168 
values of the time period. 

We can also study how the cells in each cluster are 
organized in the city’s space. For the sake of compar¬ 
ison, we arbitrarily consider as city center the loca¬ 
tion of the City Hall and build a histogram with the 
number of cells at a certain distance from it. Since 
each city has a different spatial extension, distances 
are normalized by dividing by the maximal distance 
in each city so as to produce comparable results. The 
distributions are shown in Figure 4, where average 
curves over all cities have been superimposed. It is 
interesting to note certain similarity in the distribu¬ 
tion of cells for all urban areas. City size acts as a 
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Figure 4. Distribution of the distance between the cells and the City Hall according to the type of land use. The 

distance has been normalized by the maximum distance in each city. 


natural cutoff in the distributions, although no simple 
functional shape is found in any of the clusters. Res¬ 
idential cells are well distributed across the cities but 
with a maximum not very far from the center. Busi¬ 
ness cells appear at a similar distance as Residential 
but peaking a little further. Logistics and Industry 
are preferentially located in the periphery, while the 
Nightlife cells are well distributed along the urban ar¬ 
eas but slightly more concentrated in the central areas. 

In order to quantify land use distribution patterns, 
we use the Ripley’s K [49] defined as 

A n 

K ( r ) = ^2^2 N i( r )> (!) 

i 

where A is the city area, r the search radius (a geo¬ 
graphical scale), the index i runs over the cells in the 
urban area and n is the total number of cells. Ni(r ) 
stands for the number of cells of a given type within 
a distance r from the cell i. This indicator measures 
the spatial heterogeneity of a given type of cells. The 
baseline for homogeneous random systems is a growth 
K(r) = 7 T r 2 until reaching A. If the value of K{r) is 
over the random curve for a certain r it implies that 
the system is clusterized at that scale. Since cities 
have different sizes, both K{r) and the radius must 
be normalized by their maximum values (A for K(r) 
and the maximum distance for r). Curves for the nor¬ 
malized Ripley’s K for each city and land use type are 


displayed in Figure 5A as a function of the normalized 
radius. The K(r)/A for each city are always above the 
green curve corresponding to a random distribution of 
land use types, indicating coarsening of land use. We 
find a scaling-like curve for all the land use types with 
most of the cities following well the general trend with 
some small deviations for Nightlife in Seville. 

Deepening the analysis, we can also define an en¬ 
tropy index to characterize the land use spatial orga¬ 
nization. Let us consider a frame containing the full 
urban area, which is, in turn, sub-divided in a certain 
number D 2 of equal divisions. Each of these subdivi¬ 
sions, Bi , intersects the elementary cells so a certain 
fraction of area falls in each of the land use types: fj 3 
in the Residential cluster, f j 3 in Business, //' in Lo¬ 
gistics, and f t N in Nightlife. An entropy index, Ei, 
can be defined for Bi as 

= ( 2 ) 

a 

where a runs over the four clusters. The entropy 
Ei is then averaged over all the divisions to obtain a 
global metric for the city at a given scale E(D). E(D) 
tends to zero if the land use within the divisions be¬ 
comes unique, as occurs for instance at large D (small 
spatial scales). On the other extreme, when D —» 1 
, E(D) converges to a fixed value describing the full 
city. Figure 5B shows how the average entropy be- 
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Figure 5. Comparison of the observed and the simulated Ripley's K and average entropy index. (A) Ripley's K divided 
by the city area as a function of the search radius. The radius has been normalized with the maximum value in each urban 
area. (B) Average entropy index as function of the lateral number of divisions (inverse scale) D. The color and symbols of 
the curves represent different cities. The red curve corresponds to our model results and the green curve is the outcome of a 
random null model. Results for our model were obtained with a calibrated value of 7 = 0.8. The red and green curves display 
the average over 100 realizations. 


haves with D. The curves are similar across cities, 
recalling the shape of scaling functions. This is not 
surprising if the concept of a fractal-like distribution 
of the city activity applies as has been previously dis¬ 
cussed in the literature [37-42]. 

Modeling land use 


binations of types. Some land uses attract each other 
as, for instance, Residential and Business, while oth¬ 
ers repel as Residential and Logistics. The existence 
of rules of attraction and repulsion between different 
types of land use has already been considered in the 
past like for example in [53, 54]. To be specific if p\ 
is the fraction of neighbors of i of type t, then S, is 
calculated as 


Urban land use models in the literature are typically 
built with relative elaborated mechanisms [50, 51]. If 
basic in the rules, the models typically refer to char¬ 
acteristics of cities such as the population or activ¬ 
ity distributions [44-46]. The shape of E(D) can be 
explained, however, by a simple model inspired by 
Schelling’s segregation [52]. It is important to stress 
that this model is not intended to reproduce all the 
processes leading to the land use formation, but to 
explain the scaling of its spatial distribution patterns. 

The basic framework is a lattice in 2D representing 
the urban space. Initially, a variable t t with a land 
use type is assigned to every cell i at random (Res¬ 
idential R. Business B , Logistics L or Nightlife N). 
The global fraction of cells of each type respects the 
proportions found in the empirical data in such a way 
that E(D = 1) coincides with the observations. A sat¬ 
isfaction index, Si, is then defined per cell taking into 
account its type and those of its neighbors. Similarly 
to Schelling’s model, we assume that the satisfaction 
increases when a cell is surrounded by cells of its own 
type. Otherwise, Si depends on the particular com- 


if ti = L, 
if R = N, 

if ti = R, B , 


Si — 

Si = Pn 

tip* 0 with probability 7 , 

Si= ; 

Pr b 0 with probability 1 — 7 , 


where S PiX is the Kronecker delta (equal to one if 
p = x, zero otherwise) and 7 is the only model pa¬ 
rameter. Note that the first condition implies that for 
Logistic cells S 1 ,- = 1 only if they are surrounded by 
cells of the same type, and that cells of other types 
have zero satisfaction if surrounded by any Logistic 
one. With this rule, we introduce a tendency to lo¬ 
cate Industry and Logistics out of the core areas of the 
cities. Residential and Business cells have a certain 
tolerance to the R , B and N types with 7 acting as a 
mixing control parameter: if 7 = 0 , mixing is not fa¬ 
vored. A global satisfaction measure is defined as the 
sum over all the cell satisfaction indices, S = ]TL S', : . 
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Figure 6. Land use mixing. (A) Distribution of the Pearson correlations between cells activity and the average cluster 
profiles. (B) Map of Barcelona displaying the four clusters with the colors varying from white to the baseline according to the 
intensity of the relation with the assigned cluster of each cell. The color code is red for Residential, blue for Business, cyan 
for Logistics/Industry and orange for Nightlife. (C) Fraction of mixed cells as a function of the city population. (D) Fraction 
of cells classified by the type of land use mixing among those with two types of land use. 


The model is updated by choosing random pairs of 
cells and interchanging their land use if the exchange 
increases S. This process is repeated until the satis¬ 
faction reaches a stationary state. 


Calibrating the single parameter 7 , we can repro¬ 
duce the observed K{r)/A and E(D) scaling in the 
real urban areas (see red curves in Figure 5). The 
value of the mixing parameter at which the best av¬ 
erage results are obtained is 7 = 0.08 (Figure S 8 ). 
For comparison sake, we have included a null model 
in which the land use types are distributed at ran¬ 
dom, keeping the real proportions (green curves in 
Figure 5). Unsurprisingly, the null model fails at re¬ 
producing the curves obtained with the data, mainly 
because, generally, areas of a given land use type tend 
to cluster together to form land use zones. More in¬ 
terestingly, assuming that the satisfaction of a cell in¬ 
creases when it is surrounded by cells of its own type 
and that Logistic cells and cells repel each other (i.e. 
7 = 0 ) is also insufficient to reproduce the properties 
observed in the data (for more details see Figure S 8 
in Appendix). Therefore, taking into account the at¬ 
traction between Residential and Business areas seems 
to be crucial for the reproduction of land use spatial 
organization in cities. 


Mixing of land use types 

So far, we have considered that each elementary cell 
has a unique land use type associated. This condition 
can be easily relaxed. If an average activity profile is 
defined for each of the four clusters, a Pearson correla¬ 
tion coefficient between the activity profile in each cell 
and the clusters’ averages can be calculated. The dis¬ 
tribution of correlation values is shown in Figure 6 A. 
The highest correlation value corresponds typically to 
the cluster at which the cell is assigned. Still, in some 
cases, positive correlation values are found for other or 
even two other clusters. For every cell, we can quan¬ 
tify the intensity of its relation with each cluster by 
summing over these positive correlations and normal¬ 
izing by the total. A map of the Barcelona metropoli¬ 
tan area with the intensity of each cell relation with 
its assigned cluster is shown in Figure 6 B. The colors 
represent the four main type of cells and the color sat¬ 
uration is related to the correlation: darker if the cor¬ 
relation is high, paler otherwise. Most cells match well 
with their original assigned cluster, keeping darker col¬ 
ors, while some are brighter, implying a higher level 
of land use mixing. 

We arbitrarily define a cell as mixed when the nor¬ 
malized correlations fall within the interval 0.3 — 0.7 
for other clusters besides the assigned one. The frac¬ 
tion of mixed cells as a function of the city popu- 
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lation is displayed in Figure 6C. Larger cities show 
lower mixing and contain areas devoted to more spe¬ 
cific purposes. Figure 6D illustrates how the land use 
types combine in each cell. Business and Residential 
integrate well together as in our model, increasingly 
so for smaller cities. The mixing proportions are city- 
dependent and act as a fingerprint to characterize each 
urban area. This feature may be used to classify cities 
with similar land use mixing patterns. Besides popu¬ 
lation size, causal links between level of spatial mixing 
and city shape, area, age of the city, function, etc. will 
be explored in future investigations. The mixing pro¬ 
portion can be either related to the organization of 
cities as monocentric or polycentric [28, 55]. Smaller 
cities display a more monocentric structure, which can 
be associated with the mixing of land use types given 
the most restricted space. 

DISCUSSION 

In summary, we introduce a method to automati¬ 
cally detect land use from electronic records and ap¬ 
ply it to the five largest urban areas of Spain in order 
to perform a systematic comparison across them on 
the land use distribution. The urban space is divided 
in a regular grid to prevent geographic heterogeneity 
and to maintain the spatial scale under control. The 
user activity profiles are monitored in each unit cell 
along time, and then a correlation matrix is estab¬ 
lished between the profiles of every pair of cells. This 
correlation matrix encodes the functional network of 
each city. We analyze them by using network cluster¬ 
ing techniques, which ensures that cells showing sim¬ 
ilar use profiles are grouped together. This method 
has been applied to the five most populated Spanish 
cities: Madrid, Barcelona, Valencia, Seville and Bil¬ 
bao. Since the delimitation of urban areas could af¬ 
fect the results, the definition of the municipal trans¬ 
port offices is employed in each case. Interestingly 
given that the method is unsupervised, four groups 
consistently appear as dominant in all cities. They 
correspond to activity profiles compatible with main 
land uses in Residence, Business, Logistics/Industry 
and Nightlife. Not only the types are the same across 
cities, but also the proportions of cells and area de¬ 
voted to each type are similar. 

We also study the distribution of the four land use 
types at different spatial scales. We define the Ripley’s 
K and the entropy index for each land use type and 
the behavior of both metrics is explored as the spa¬ 
tial scale varies from the full city (macroscopic scale) 
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APPENDIX 
Case studies 

In this study, we focused on the five biggest 
metropolitan areas of Spain, Madrid, Barcelona, 
Valencia, Seville and Bilbao (Figure SI). These 
metropolitan areas are very different in terms of sizes 
and populations (Table SI). For all cities we have 
selected as urban area the one served by public trans¬ 
portation (bus and metro) instead of the official defi¬ 
nition that in the case of Seville includes a much larger 
extension relatively depopulated. 


Data pre-processing 

Mobile phone records of anonimyzed users during 
55 days (hereafter noted T) within the period of 
September-November 2009 were aggregated in two dif¬ 
ferent ways. The aggregated data corresponds to the 
number of users per hour and per base transceiver 
stations (BTSs) identified with UTM (WSG84) coor¬ 
dinates. A user may appear connected to more than 
one BTS within a period of one hour. To avoid over 
counting people the following criteria was used when 



Figure SI. Map of the metropolitan areas. 
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Table SI. Summary statistics on the metropolitan areas. 


Metropolitan area Number of municipalities Number of inhabitants Area (kin 2 ) 


Madrid 

27 

5,512,495 

1,935.97 

Barcelona 

36 

3,218,071 

634 

Valencia 

43 

1,549,855 

628.81 

Sevilla 

8 

983,852 

352 

Bilbao 

34 

908,916 

500.2 


aggregating the data: each person shall count only 
once per hour. If a user is detected in k different posi¬ 
tions within a certain 1-hour time period, each regis¬ 
tered position will count as (1 /k) ’’units of activity”. 
From this aggregated data activity per BTS and per 
hour is calculated for each day. In order to compute 
the number of mobile phone users P g ^ d {h) in a grid cell 
g (dimension 0.5 x 0.5 km 2 ) for a day d £ T between 
h and h + 1, where h £ | [0,23] |, we first computed the 
Voronoi cells associated with each BTS. 


Grid cells 

Let n be the number of grid cells, the number of 
mobile phone users N g d(h) (on day d at time h) is 
given by the following equation, V g £ |[l,n]|: 


m . 

N g , d {h)=YA N ^{h)^p L - ( 2 ) 

, —1 


Voronoi cells 

First we remove the BTSs with zero mobile phone 
users and we compute the Voronoi cells associated 
with each BTSs of the metropolitan area (hereafter 
called MA). We remark in Figure S2A that there are 
four types of Voronoi cells: 


Then the set of days T is divided into subsets T w C 
T and the average number of mobile phone users is 
computed for each day of the week w (Equation S3). 


A r g ,w(h) 


I T w \ 


( 3 ) 


1. The Voronoi cells contained in MA. 

2. The Voronoi cells between MA and the territory 
outside the metropolitan area. 

3. The Voronoi cells between MA and the sea 
(noted S). 

4. The Voronoi cells between MA, the territory 
outside the metropolitan area and the sea. 

To compute the number of users associated with 
the intersections between the Voronoi cells and MA 
we have to take into account these different types of 
Voronoi cells. Let m be the number of Voronoi cells (ie 
BTSs), N v ,d(h) the number of users in a Voronoi cell v 
(on day d at time h) and A v the area ofv,v£ | [1, to] |. 
The number of users N vn MA,d{h) in the intersection 
between v and MA is given by the following equation: 

N v nMA, d {h) = N v , d (h ) ( ) (1) 

V Aj — Ans / 

We note in Equation SI that we have removed the 
intersection of the Voronoi area with the sea, indeed, 
we assume that the number of users calling from the 
sea are negligible. Now we consider the number of 
mobile phone users N v d(h) and the associated area 
A v of the Voronoi cells intersecting MA (Figure S2B). 


The average number of mobile phone users for the 
metropolitan areas according to the time and the day 
of the week are plotted on Figure S3. The profile 
curve shows two peaks, one peak around 12AM and an 
other one around 7PM. It also shows that the number 
of mobile phone users is higher during weekdays than 
during weekend. 

N is normalized such that the total number of users 
at a given time on a given day is equal to 1, Equation 
S4, 


Ng 0< w(Ji) 


A go , w (h ) 

S 9 =i Ng iW (h) 


( 4 ) 


This normalization allows for a direct comparison 
between sources with different absolute user’s activ¬ 
ity. For a given grid cell g = go we defined the tem¬ 
poral distribution of users N go as the concatenation 
of the temporal distribution of users associated with 
each day of the week. For each grid cell we obtained a 
temporal distribution of users (also called signal) rep¬ 
resented by a vector of length 24 x 7. It is possible 
that some grid cells have exactly the same signal be¬ 
cause some Voronoi cells may contain several cells, in 
this case the grid cells have been aggregated (Figure 
S2C). 
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Figure S2. Map of the metropolitan area of Barcelona. The white area represents the metropolitan area, the dark gray 
area represents territory surrounding the metropolitan area and the light grey area represents the sea. (A) Voronoi cells of the 
mobile phone antennas point pattern. (B) Intersection between the Voronoi cells and the metropolitan area. (C) Recording 
sites composed of grid cells of dimension 0.5 x 0.5 km 2 . 


Madrid 


Barcelona 


Valencia 






Monday 

Tuesday 

Wednesday 

Thursday 

Friday 

Saturday 

Sunday 


Figure S3. Average number of mobile phone users per hour according to the day of the week for the five metropolitan 
areas. 


Functional network 

Choice of S 

In the method used to extract the functional net¬ 
work from the mobile phone data presented in the 
main text we apply a threshold S to the correlation 
matrix in order to remove the noise and negative cor¬ 
relations from the correlation matrix. Hence, we have 


to choose a value of S high enough to remove the noise 
but not too high in order to preserve the structure 
and the properties of the network. Figure S4 displays 
the distribution of the weights (i.e. correlation coeffi¬ 
cient) for the five case studies. One can observe that 
these distributions can be approximated by a Gaus¬ 
sian distribution. Therefore, we have decided to keep 
only edges with a weight higher than the weight dis¬ 
tribution’s standard deviation. In Figure S4 we note 
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8 Weights 


Figure S4. Number of connected components as a function of 6 (Left) and weight distribution (Right) for the five 
case studies. From top to bottom, Madrid, Barcelona, Valencia, Sevilla and Bilbao. 


Table S2. Statistical properties of the functional net¬ 
works. 


City 

SD 

N 

E 

< k > 

< k > /N 

C 

L 

C r 

L r 

Madrid 

0.42 

1,381 

222,227 

321.8 

0.233 

0.69 

2.04 

0.31 

1.77 

Barcelona 

0.38 

652 

46,573 

142.9 

0.219 

0.62 

2.02 

0.29 

1.79 

Valencia 

0.35 

351 

13,847 

78.9 

0.225 

0.66 

2.06 

0.31 

1.84 

Sevilla 

0.38 

188 

3,700 

39.2 

0.209 

0.62 

2.15 

0.26 

1.81 

Bilbao 

0.35 

267 

8,915 

66.8 

0.25 

0.67 

2.03 

0.39 

1.76 


that for S lower than the weight distribution’s stan¬ 
dard deviation (around 0.4, see details in Table S2) 
the number of connected components is equal to 1. 

Table S2 summarizes the statistical properties of 
the functional networks obtained for the five case stud¬ 


ies. In these tables we can observe the threshold ( SD ), 
the number of nodes (i.e number of cells) (TV), the 
number of edges ( E ), the average degree (< k >), 
the average clustering coefficient (C) and the average 
shortest path length (L). The average clustering co¬ 
efficient C r and the average shortest path length L r 
have been obtained with a randomly rewired network 
preserving the degree of the original network by per¬ 
muting links (4 x (number of edges) times) [56]. We 
observe that the five networks are very similar, charac¬ 
terized by a high clustering coefficient and low average 
shortest path. 
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Figure S5. Contingency tables between the partitions obtained with Infomap and OSLOM for each case study. (A) 

Madrid. (B) Barcelona. (C) Valencia. (D) Sevilla. (E) Bilbao. Each row represents a cluster obtained with Infomap and each 
column represents a cluster obtained with Oslom. The matrices have been normalized so that the sum of each column is equal 
to one. 


Community detection 

Community detection in complex networks has re¬ 
cently been the subject of an abundant literature and 
a large number of algorithms has been proposed the 
last few years. The purpose of these algorithms is to 
identify closely connected groups of nodes within a 
network. To do so, several techniques are used such 
as maximizing the modularity, measuring probability 
flows of random walks or optimizing the local statis¬ 
tical significance of communities. 

In this paper, we have decided to use the Infomap 
method proposed in [48]. Infomap finds communi¬ 
ties by using the probability of flow of random walks 
on the network as a proxy for information flow in the 
real system and then decompose the graph into groups 
of nodes among which information flows easily. As 
shown in [57], this method gives good results, however, 
to evaluate the robustness of the results, the analy¬ 
sis has also been performed with two other clustering 
methods, Oslom [58, 59] and Louvain [60]. Oslom 
is a method based on a topological approach to de¬ 
tect statistically significant cluster whereas Louvain is 
based on modularity optimization which means find¬ 
ing the optimal partition maximizing the density of 
links within clusters and minimizing the density of 
links between clusters. 

In order to compare the partition obtained with the 
different method we have plotted in Figure S5 and S6, 
respectively, the contingency tables between the parti¬ 
tions obtained with Infomap and Oslom and Infomap 
and Louvain for each case study. In these figures, each 
plot represents a contingency table C in which each 
element C Kj is the number of nodes which belong to 
the cluster i detected with Infomap and to the clus¬ 


ter j detected with Oslom or Louvain. The matrices 
have been normalized so that the sum of each col¬ 
umn is equal to one. This normalization allows us 
to study how the nodes belonging to the groups ob¬ 
tained with Oslom or Louvain are distributed among 
the clusters found with Infomap. First, we can observe 
that the number of communities detected with Lou¬ 
vain or Oslom is always greater or equal to the ones 
obtained with Infomap. Indeed, Louvain has detected 
a similar number of clusters whereas the number of 
communities detected with Oslom increases with the 
size of the metropolitan area, from 5 clusters for Bil¬ 
bao to 12 for Madrid. However, it is worth noting that 
in most of the cases, more than 80% of the nodes be¬ 
longing to the Oslom and Louvain’s clusters are gath¬ 
ered in one Infomap cluster. This means that even 
if the size of the partitions are different, we observe 
that clusters obtained with Louvain and Oslom are 
sub-clusters of clusters identified with Infomap. 

Comparison with cadastral data 

In order to validate the results we compared the 
land use patterns obtained with our algorithm with 
cadastral data available on the Spanish Cadastral 
Electronic Site [61]. The dataset contains informa¬ 
tion about land use for each cadastral parcel of the 
metropolitan area of Madrid and Barcelona (about 
650,000 parcels). In particular, we have for each 
cadastral parcel the net internal area devoted to Res¬ 
idential, Business and Industrial uses. We can use 
these data to identify the dominant cadastral land use 
in each grid cell classified as Residential, Business and 
Industrial uses by the community detection algorithm. 
To do so we need to define a rule to determine what is 
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Figure S6. Contingency tables between the partitions obtained with Infomap and Louvain for each case study. (A) 

Madrid. (B) Barcelona. (C) Valencia. (D) Sevilla. (E) Bilbao. Each row represents a cluster obtained with Infomap and each 
column represents a cluster obtained with Louvain. The matrices have been normalized so that the sum of each column is 
equal to one. 


the dominant land use in a cell. Intuitively, one would 
tend to identify the dominant land use in a cell as the 
land use class with the largest area. However, Resi¬ 
dential use is the land use class with the largest area 
in most of the cell leading to an over-representation of 
Residential cells in the metropolitan area. To circum¬ 
vent this limitation we introduce two thresholds 5 Bus 
and 5L 0 g to identify Business and Logistics cells with 
cadastral data. If the fraction of area devoted to Busi¬ 
ness in a grid cell is higher than Sb U s then the grid 
cell is classified as Business. Otherwise, if the fraction 
of area devoted to Logistics is higher than 5Log then 
the grid cell is classified as Industry. Finally, if the 
fraction of area devoted to Business and Logistics is, 
respectively, lower than 5Bus and 5Log then the grid 
cell is classified as Residential. 

Hence, we can adjust the values of these two thresh¬ 
olds in order to obtain a distribution of the fraction 
of cells according to the land use type similar to the 
one obtained with our algorithm. To this end we 
have calibrated these parameters by minimizing the 
L 2 distance between the distribution of the fraction 
of cells according to the land use type obtained with 
the cadastral data and the one obtained with our al¬ 
gorithm for the municipality of Barcelona which rep¬ 
resents 20% of the metropolitan area of Barcelona. 
In Figure S7, we can observe that the minimum is 
reached for 6b us = 0.2 and SLog = 0.2. Now we can 
use these values to identify the dominant cadastral 
land use in each grid cell of the metropolitan area of 



Sbus 


Figure S7. L 2 distance between the distribution of the frac¬ 
tion of cells according to the land use type (Residential, 
Business and Logistics) obtained with our algorithm and the 
cadastral data as a function of Sbus and <5 _l 09 for the munic¬ 
ipality of Barcelona. 


Barcelona and Madrid. 

We find a percentage of correct predictions equal 
to 65% for Madrid and 60% for Barcelona which is 
consistent with values obtained in other studies, 54% 
in [21] and 58% in [23]. Furthermore, for both case 
studies, almost all land use types have a percentage 
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Table S3, onfusion matrix of the classification for Madrid and Barcelona. For the Residential, Business and Logistics rows 
and columns, the value in the i th row and the j th column gives the percentage of grid cells classified as use i by the cadastral 
classification which are classified as belonging to the class j by the algorithm. The Total is the distribution of the percentage 
of cells according to the land use type obtained with our algorithm (row) and the cadastral data (column) with the threshold 
values 5bus = 0.2 and SLog = 0.2. 


Madrid 


Algorithm 
Cadastral 

Residential 

Business 

Logistics 

Total 

Residential 

71.23 

21.67 

7.1 

49.04 

Business 

30.84 

62.33 

6.83 

39.55 

Logistics 

22.9 

32.06 

45.04 

11.41 

Total 

49.74 

40.42 

9.84 

100 


Barcelona 


~—Algorithm 
Cadastral^~~~~—— 

Residential 

Business 

Logistics 

Total 

Residential 

68 

26.55 

5.45 

47.5 

Business 

28.99 

52.17 

18.84 

35.75 

Logistics 

13.4 

32.99 

53.61 

16.75 

Total 

45.77 

37.65 

16.58 

100 


of correct predictions higher than 50% (Table S3). 


Calibration of 7 

The value of 7 was calibrated in order to reproduce 
the evolution of the entropy index as a function of the 
number of divisions by side obtained with the data 
(red line in Figure 4 in the main text). We chose the 
value of 7 minimizing the Euclidean distance between 
the observed values and the average values obtained 
with the model with 100 replications. The best results 
have been obtained with the value 7 = 0.8 (Figure S 8 ). 
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Figure S 8 . Results obtained with different values of 7 (7 = 0, 7 = 0.8 and 7 = 1 ) and T = 500, 000. The 2D lattice used to 
represent the urban space is composed of 50 x 50 = 2,500 cells. The model seems to converge after 300,000 iterations but 
to ensure the convergence all the results shown in the paper were obtained with 500,000 iterations. 
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Figure S9. (A-B) Geographical location of the clusters for Madrid and Barcelona. (C-D) Temporal patterns associated with 
the four clusters for both metropolitan areas. In red, Residential cluster; In blue, Business; In cyan, Logistics; And in orange, 
Nightlife. 
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Figure S10. (A), (C) and (E) Geographical representation of the communities for Valencia (A), Sevilla (C) and Bilbao (E). 
(B), (D) and (F) Temporal patterns associated with the communities for the metropolitan area of Valencia (B), Sevilla (D) and 
Bilbao (F). In red, the Residential community; In blue, the Business community; In cyan, the Logistics/Industry community; 
In orange, the Nightlife community. 
























































